The 2019 novel coronavirus (2019-nCoV), which causes the disease Covid-19, was identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the Wuhan outbreak reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating that person-to-person spread is occurring.
Individual risk for the disease depends on exposure. Covid-19 has now been detected in almost 150 locations internationally, including the United States. Close to 170,000 people have been sickened by Covid-19 and more than 7,000 have died from the disease, a death toll that far surpasses that of the severe acute respiratory syndrome (SARS) epidemic of 2002 and 2003. Officials everywhere have implemented measures to contain the virus, including travel restrictions and quarantines. Based on these circumstances, the World Health Organization (WHO) has declared Covid-19 a pandemic (an epidemic that spreads throughout the world).
This study will highlight the regions affected globally, the number of people affected, deaths, recoveries and the common symptoms. The visualisations will help us understand the spread of the disease, the estimated mortality rate and other statistics. The study aims to create awareness among the general public, which is of extreme importance in this situation, and to provide useful insights into the data. It can also be used by researchers and students who are interested in the subject.
The data used for this study has been taken from the links mentioned below. Furthermore, we will access data through the Foursquare API and arrange it as a dataframe for visualization. One of the datasets contains data from 22nd Jan, 2020 onwards, while the other consists of daily case reports of the number of confirmed, dead and recovered cases. The usage of the data is explained in detail in the implementation section.
Links:
https://github.com/CSSEGISandData/COVID-19
https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset
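The daily case report used later in this study is a file named 03-15-2020.csv, which suggests a month-day-year naming pattern. The following is a minimal sketch, assuming that pattern holds for other days as well; the helper name `daily_report_filename` is hypothetical and not part of either dataset.

```python
from datetime import date


def daily_report_filename(day: date) -> str:
    """Build a daily report file name, assuming an MM-DD-YYYY.csv pattern."""
    return day.strftime("%m-%d-%Y") + ".csv"


print(daily_report_filename(date(2020, 3, 15)))  # 03-15-2020.csv
```

A name built this way can be passed to `pd.read_csv` once the matching file has been downloaded.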
This section describes the main components of our study. The steps are as follows:
Let's get started!
We'll import all the necessary libraries and collect the data used for analysis.
!pip install plotly
!pip install folium
!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium # plotting library
!python -m pip install --upgrade pip
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import plotly as py # library for data visualization
import plotly.express as px
import plotly.graph_objs as go
#Data as on 15-03-2020
df1 = pd.read_csv("03-15-2020.csv")
df1.head()
df2 = pd.read_csv('covid_19_data.csv')
df2.rename(columns={'ObservationDate':'Date', 'Country/Region':'Country'}, inplace=True)
Earliest cases - starting from 22nd January, 2020
df2.head()
As we know, the earliest cases were mainly in China whereas the latest cases are spread around the globe.
Latest cases - 15th March, 2020
df2.tail()
#Grouping the data countrywise: sum the provinces so each country has one total per date
df_countries = df2.groupby(['Country', 'Date']).sum(numeric_only=True).reset_index().sort_values('Date', ascending=False)
df_countries = df_countries.drop_duplicates(subset = ['Country'])
df_countries = df_countries[df_countries['Confirmed']>0]
df_countries
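The idea behind the country-wise grouping can be illustrated without pandas. The sketch below is a plain-Python approximation under two assumptions: that a country's total for a date is the sum over its provinces, and that dates are MM/DD/YYYY strings. The rows are toy values, not figures from the dataset, and `latest_country_totals` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime

# Toy rows: (country, province, date, cumulative confirmed). Illustrative values only.
rows = [
    ("A", "North", "03/14/2020", 6),
    ("A", "South", "03/14/2020", 4),
    ("A", "North", "03/15/2020", 7),
    ("A", "South", "03/15/2020", 5),
    ("B", None, "03/15/2020", 0),
    ("C", None, "03/13/2020", 5),
]


def latest_country_totals(records):
    """Sum provinces per (country, date), then keep each country's latest date."""
    totals = defaultdict(int)
    for country, _province, day, confirmed in records:
        totals[(country, datetime.strptime(day, "%m/%d/%Y"))] += confirmed

    latest = {}
    for (country, day), confirmed in totals.items():
        if country not in latest or day > latest[country][0]:
            latest[country] = (day, confirmed)

    # Keep only countries whose latest total is above zero.
    return {country: n for country, (_, n) in latest.items() if n > 0}


print(latest_country_totals(rows))  # {'A': 12, 'C': 5}
```

Country B drops out because its latest total is zero, which mirrors the `Confirmed > 0` filter.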
fig = go.Figure(data=go.Choropleth(
    locations = df_countries['Country'],
    locationmode = 'country names',
    z = df_countries['Confirmed'],
    colorscale = 'Reds',
    marker_line_color = 'black',
    marker_line_width = 0.5,
))
fig.update_layout(
    title_text = 'Confirmed Cases as on March 15, 2020',
    title_x = 0.5,
    geo=dict(
        showframe = False,
        showcoastlines = False,
        projection_type = 'equirectangular'
    )
)
As we can see, the highest number of confirmed cases as on March 15, 2020 has been recorded in China, followed by Italy and then Iran. The US, Brazil, Australia and other countries also have a considerable number of confirmed cases.
df_countrydate = df2[df2['Confirmed']>0]
df_countrydate = df_countrydate.groupby(['Date','Country']).sum(numeric_only=True).reset_index() # sum only the numeric case counts
df_countrydate
fig = px.choropleth(df_countrydate,
                    locations="Country",
                    locationmode = "country names",
                    color="Confirmed",
                    hover_name="Country",
                    animation_frame="Date"
                    )
fig.update_layout(
    title_text = 'Spread of Coronavirus',
    title_x = 0.5,
    geo=dict(
        showframe = False,
        showcoastlines = False,
    ))
fig.show()
This animated choropleth shows how the coronavirus has been spreading from 22nd January to 15th March, 2020. We can see that the spread began in China (as on 22nd Jan), with the US, South Korea and Japan having one or two cases each, and then gradually reached different parts of the world.
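An animation like this only reads correctly if its frames are in chronological order. The sketch below assumes the dates are stored as MM/DD/YYYY strings; with that format, plain string sorting only works while all dates fall in the same year. The December 2019 entry is an illustrative value added to show the problem and is not part of the dataset.

```python
from datetime import datetime

dates = ["03/15/2020", "01/22/2020", "12/31/2019", "02/13/2020"]

# Plain string sorting puts December 2019 last, which is wrong chronologically.
print(sorted(dates))
# ['01/22/2020', '02/13/2020', '03/15/2020', '12/31/2019']

# Parsing each string first gives the true chronological order.
chronological = sorted(dates, key=lambda d: datetime.strptime(d, "%m/%d/%Y"))
print(chronological)
# ['12/31/2019', '01/22/2020', '02/13/2020', '03/15/2020']
```

For the period covered here (January to March 2020) both orders agree, so this matters mainly if the data is extended across a year boundary.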
fig = px.pie(df_countries, values = 'Confirmed', names='Country', height=600)
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.update_layout(
    title_x = 0.5,
    geo=dict(
        showframe = False,
        showcoastlines = False,
    ))
fig.show()
It's clear that China still accounts for the largest single share of confirmed cases. However, there has been a gargantuan increase in the number of cases outside China, and their combined total is now greater than the total in China.
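A country can hold the largest slice of the pie while the rest of the world combined is still larger. The following sketch shows that arithmetic on placeholder numbers; the country names and counts are invented for illustration and `largest_share` is a hypothetical helper.

```python
def largest_share(totals):
    """Return the name with the highest count and its fraction of the overall total."""
    overall = sum(totals.values())
    name = max(totals, key=totals.get)
    return name, totals[name] / overall


# Placeholder counts, not real case numbers.
toy_totals = {"country_a": 48, "country_b": 15, "country_c": 9, "others": 28}

name, share = largest_share(toy_totals)
print(name, f"{share:.0%}", f"{1 - share:.0%}")  # country_a 48% 52%
```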
confirmed = df2.groupby('Date')['Confirmed'].sum().reset_index()
deaths = df2.groupby('Date')['Deaths'].sum().reset_index()
recovered = df2.groupby('Date')['Recovered'].sum().reset_index()
fig = go.Figure()
fig.add_trace(go.Bar(x=confirmed['Date'],
                     y=confirmed['Confirmed'],
                     name='Confirmed',
                     marker_color='blue'
                     ))
fig.add_trace(go.Bar(x=deaths['Date'],
                     y=deaths['Deaths'],
                     name='Deaths',
                     marker_color='Red'
                     ))
fig.add_trace(go.Bar(x=recovered['Date'],
                     y=recovered['Recovered'],
                     name='Recovered',
                     marker_color='Green'
                     ))
fig.update_layout(
    title='Worldwide CoronaVirus Cases - Confirmed, Deaths, Recovered (Bar Chart)',
    xaxis_tickfont_size=14,
    yaxis=dict(
        title=dict(text='Number of Cases', font=dict(size=16)), # axis title text and font size
        tickfont_size=14
    ),
    legend=dict(
        x=0,
        y=1.0,
        bgcolor='rgba(233, 233, 233, 0)',
        bordercolor='rgba(255, 255, 255, 0)'
    ),
    barmode='group',
    bargap=0.15, # gap between bars of adjacent location coordinates.
    bargroupgap=0.1 # gap between bars of the same location coordinate.
)
fig.show()
The bar chart shows confirmed cases, deaths and recoveries worldwide. We can see a spike in the number of confirmed cases on 13th February, caused by a new method of classifying confirmed cases; this is also when Wuhan tightened its curfew to a full lockdown. The count then started to plateau, but following the release of passengers from the Diamond Princess cruise ship, an increasing number of countries reported cases towards the end of February. Since then, the numbers have increased exponentially, especially outside China.
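Because the counts in the chart are cumulative, a spike is easier to locate after converting them to daily increases. Here is a minimal sketch on an invented series; the numbers are placeholders chosen to contain one obvious jump.

```python
# Illustrative cumulative counts for five consecutive days (not real data).
cumulative = [100, 120, 150, 400, 430]

# Daily new cases are the differences between consecutive cumulative values.
daily_new = [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]
print(daily_new)  # [20, 30, 250, 30]

# Index (in the cumulative list) of the day with the largest single-day increase.
spike_day = daily_new.index(max(daily_new)) + 1
print(spike_day)  # 3
```

With pandas, the same differences could be obtained by calling `.diff()` on the cumulative column.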
Now let's study these graphs individually.
bar_data = df2.groupby(['Country', 'Date'])[['Confirmed', 'Deaths', 'Recovered']].sum().reset_index().sort_values('Date', ascending=True)
fig = px.bar(bar_data, x="Date", y="Confirmed", color='Country', text = 'Confirmed', orientation='v', height=600,
             title='Confirmed')
fig.show()
There is no doubt that China has the largest number of cases. However, in the case of Mainland China, the curve flattened to a great extent in late February and March. The increase is no longer exponential, which is a positive sign and also shows China's competency in dealing with this pandemic. On the other hand, there has been an exponential increase in confirmed cases outside China, especially in Italy.
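One common way to put a number on "exponential" versus "flattened" is the doubling time. The sketch below assumes steady exponential growth between two observations, which is a simplification; the function name and the input values are illustrative only.

```python
import math


def doubling_time(first_count, last_count, days_elapsed):
    """Days needed for the count to double, assuming steady exponential growth."""
    if first_count <= 0 or last_count <= first_count or days_elapsed <= 0:
        raise ValueError("need a positive, growing series over a positive time span")
    return days_elapsed * math.log(2) / math.log(last_count / first_count)


print(round(doubling_time(100, 800, 3), 2))   # 1.0
print(round(doubling_time(100, 200, 10), 2))  # 10.0
```

A short doubling time corresponds to a steep curve, while a doubling time that keeps growing corresponds to a curve that is flattening.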
fig = px.bar(bar_data, x="Date", y="Deaths", color='Country', text = 'Deaths', orientation='v', height=600,
             title='Deaths')
fig.show()
Similarly, the number of deaths has also increased gradually. The curve has flattened in China but is still exponential outside China, especially in Italy and Iran.
fig = px.bar(bar_data, x="Date", y="Recovered", color='Country', text = 'Recovered', orientation='v', height=600,
             title='Recovered')
fig.show()
The bar chart shows the exponential increase in the number of recovered cases. The aim is for this graph to keep rising, rather than flatten, until all cases have recovered.
line_data = df2.groupby('Date').sum(numeric_only=True).reset_index() # sum only the numeric case counts
line_data = line_data.melt(id_vars='Date',
                           value_vars=['Confirmed',
                                       'Recovered',
                                       'Deaths'],
                           var_name='Ratio',
                           value_name='Value')
fig = px.line(line_data, x="Date", y="Value", color='Ratio',
              title='Confirmed cases, Recovered cases, and Deaths Over Time')
fig.show()
This is a consolidated representation of the bar graphs shown above. Ideally, we would like to see the red line (Recovered) rise to meet the blue line (Confirmed), which would mean that few active cases remain.
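The gap between the lines can be read as the number of active cases. A minimal sketch of that relationship follows, using made-up snapshots rather than dataset values; `active_cases` is a hypothetical helper.

```python
def active_cases(confirmed, recovered, deaths):
    """Cases that are neither recovered nor fatal, from cumulative totals."""
    return confirmed - recovered - deaths


# Illustrative (confirmed, recovered, deaths) snapshots over time.
timeline = [(100, 10, 2), (150, 60, 5), (160, 140, 8)]

active = [active_cases(c, r, d) for c, r, d in timeline]
print(active)  # [88, 85, 12]
```

As the recovered total approaches the confirmed total, the active count shrinks towards zero.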
For the latest date
df1.rename(columns={'Country/Region':'Country'}, inplace=True)
confirmed = df1.groupby(['Last Update', 'Country'])[['Confirmed']].sum().reset_index()
deaths = df1.groupby(['Last Update', 'Country'])[['Deaths']].sum().reset_index()
recovered = df1.groupby(['Last Update', 'Country'])[['Recovered']].sum().reset_index()
all_countries = confirmed['Country'].unique()
print("Number of countries/regions with cases: " + str(len(all_countries)))
print("Countries/Regions with cases: ")
for i in all_countries:
    print(" " + str(i))
line_data = df1.groupby('Country').sum(numeric_only=True).reset_index() # sum only the numeric case counts
line_data = line_data.melt(id_vars='Country',
                           value_vars=['Confirmed',
                                       'Recovered',
                                       'Deaths'],
                           var_name='Ratio',
                           value_name='Value')
fig = px.line(line_data, x="Country", y="Value", color='Ratio',
              title='Confirmed cases, Recovered cases, and Deaths by Country')
fig.show()
The line chart shows the number of confirmed cases, recoveries and deaths for each country as on 15th March, 2020. The mortality rate (deaths as a share of confirmed cases) is around 4% in China whereas it is around 2.5% in Iran. This is lower than that of SARS (around 10%) and MERS. Most of the casualties are senior citizens with underlying health problems.
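The mortality rate quoted here is a simple ratio of deaths to confirmed cases. The sketch below shows that calculation under the assumption that both inputs are cumulative totals for the same date; the inputs are placeholders, and a ratio computed during an ongoing outbreak is only a rough estimate.

```python
def case_fatality_ratio(deaths, confirmed):
    """Deaths as a percentage of confirmed cases; None when there are no cases."""
    if confirmed == 0:
        return None
    return 100.0 * deaths / confirmed


# Placeholder totals, not dataset values.
print(case_fatality_ratio(4, 100))  # 4.0
print(case_fatality_ratio(0, 0))    # None
```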
Now, we'll focus on the cases in China, where the virus was first detected.
df3 = df1[df1['Country']=='China']
df3
line_data = df3.groupby('Province/State').sum(numeric_only=True).reset_index() # sum only the numeric case counts
line_data = line_data.melt(id_vars='Province/State',
                           value_vars=['Confirmed',
                                       'Recovered',
                                       'Deaths'],
                           var_name='Ratio',
                           value_name='Value')
fig = px.line(line_data, x="Province/State", y="Value", color='Ratio',
              title='Confirmed cases, Recovered cases, and Deaths by Province in China')
fig.show()
As we know, Wuhan, the capital of Hubei, has been the epicenter of the disease. Hubei had 67,794 confirmed cases as on 15th March, of which around 80% have recovered.
address = 'China'
geolocator = Nominatim(user_agent="foursquare_agent") # recent geopy versions require a user_agent
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of China are {}, {}.'.format(latitude, longitude))
# create map of world using latitude and longitude values
map_world = folium.Map(location=[latitude, longitude], zoom_start=2)
# add markers to map
for lat, lng, province in zip(df1['Latitude'], df1['Longitude'], df1['Province/State']):
    label = '{}'.format(province)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3199cc',
        fill_opacity=0.3,
        parse_html=False).add_to(map_world)
map_world
# create map of China using latitude and longitude values
map_china = folium.Map(location=[latitude, longitude], zoom_start=2)
# add markers to map
for lat, lng, province in zip(df3['Latitude'], df3['Longitude'], df3['Province/State']):
    label = '{}'.format(province)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3199cc',
        fill_opacity=0.3,
        parse_html=False).add_to(map_china)
map_china
The two maps show the affected areas worldwide and within China, respectively.
We have a list of all the provinces in China where people have been affected by Covid-19.
Using Foursquare API, we will search for hospitals within 5 km of these areas, so as to see the distribution of hospitals in these provinces. This distribution plays an important role in determining accessibility to healthcare facilities in case of a health disaster like this.
# transforming JSON into a pandas dataframe
from pandas import json_normalize
VERSION = '20180604'
LIMIT = 100
#These details have been removed
CLIENT_ID = "" # your Foursquare ID
CLIENT_SECRET = "" # your Foursquare Secret
df4 = pd.DataFrame()
for i in df3['Province/State']:
    address = i
    geolocator = Nominatim(user_agent="foursquare_agent")
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    #print(latitude, longitude)
    search_query = 'Hospitals'
    radius = 5000
    #print(search_query + ' .... OK!')
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
    results = requests.get(url).json()
    # assign relevant part of JSON to venues
    venues = results['response']['venues']
    # transform venues into a dataframe and add it to the running result
    df4 = pd.concat([df4, pd.json_normalize(venues)], ignore_index=True)
df4
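The search URL used for each province can also be assembled from a dictionary of parameters, which keeps the query readable and handles escaping. This is a sketch using only the standard library; the endpoint and parameter names are taken from the request above, while `build_search_url`, the placeholder credentials and the coordinates are assumptions for illustration. No request is sent.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, lat, lng,
                     query="Hospitals", radius=5000, limit=100, version="20180604"):
    """Assemble the venue search URL from named parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",   # latitude,longitude of the search centre
        "v": version,
        "query": query,
        "radius": radius,       # search radius in metres
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Placeholder credentials and coordinates; real values are needed for a live call.
url = build_search_url("YOUR_ID", "YOUR_SECRET", 35.0, 105.0)
print(url)
```

Note that `urlencode` escapes the comma in the `ll` value as `%2C`, which decodes back to the same coordinate pair.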
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in df4.columns if col.startswith('location.')] + ['id']
dataframe_filtered = df4.loc[:, filtered_columns].copy() # work on a copy so df4 stays unchanged
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)
# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]
dataframe_filtered
hospitals_map = folium.Map(location=[35.000074,104.999927], zoom_start=3) # generate map centred in China
# add the hospitals as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(hospitals_map)
# display map
hospitals_map
On plotting the dataframe with hospital-related information, we can see only 3 hospitals on the map. The Foursquare API had returned some more venues, which were filtered out based on their category.
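The category filter itself is not shown in the code, so the following is only one possible way to do it. The sketch assumes each venue carries a single category name (or none) and that hospital-like venues can be recognised by keywords; the venue names, categories and the `is_hospital` helper are all hypothetical.

```python
# Toy venues; names and categories are invented for illustration.
venues = [
    {"name": "Venue 1", "categories": "Hospital"},
    {"name": "Venue 2", "categories": "Coffee Shop"},
    {"name": "Venue 3", "categories": None},
    {"name": "Venue 4", "categories": "Medical Center"},
]


def is_hospital(category, keywords=("hospital", "medical")):
    """True when the category name contains one of the keywords (case-insensitive)."""
    return category is not None and any(k in category.lower() for k in keywords)


hospitals = [v["name"] for v in venues if is_hospital(v["categories"])]
print(hospitals)  # ['Venue 1', 'Venue 4']
```

The keyword list is a judgment call: a stricter list removes unrelated venues but may also drop clinics that are labelled differently.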
Based on the visualization of our data, we can say that Covid-19, which was first detected in China, has now become a pandemic because of the way it has spread across the globe. Numbers have somewhat stabilized in China. However, cases are growing at an alarming rate outside China, especially in Italy and Iran. Italy has the most recorded cases outside China. Using the Foursquare API and the Folium library, we have visualized the distribution of hospitals in different Covid-hit provinces of China. Such visualizations can be used to compare the health facilities of different countries to see how well prepared they are in case of a pandemic like this. The mortality rate of Covid-19 has so far been lower than that of SARS and MERS and close to that of the Spanish Flu. However, the large number of confirmed cases shows that it spreads very easily.
With no vaccine or treatment that can prevent it yet, containing the spread of Covid-19 is vital. As a final note, all of the above analysis is dependent on the adequacy and accuracy of the data collected from various sources and Foursquare API. A more comprehensive analysis and future work would need incorporation of data from other external databases.